In multi-agent systems with a large number of agents, the contribution of each agent to the value of other agents is typically minimal (e.g., aggregation systems such as Uber and Deliveroo). In this paper, we consider such multi-agent systems in which each agent is self-interested and takes a sequence of decisions, and we represent them as a Stochastic Non-atomic Congestion Game (SNCG). We derive key properties of equilibrium solutions in the SNCG model with non-atomic and also nearly non-atomic agents. Using those key equilibrium properties, we provide a novel Multi-Agent Reinforcement Learning (MARL) mechanism that minimizes variance across the values of agents in the same state. To demonstrate the utility of this new mechanism, we provide detailed results on a real-world taxi dataset and also on a generic simulator for aggregation systems. We show that our approach reduces the variance in revenues earned by taxi drivers, while still providing higher joint revenues than leading approaches.
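The variance-minimization idea above can be sketched as a simple penalized objective. This is only an illustrative sketch, not the paper's actual MARL loss; `agent_values` (value estimates for agents sharing a state), `base_loss`, and `lam` are hypothetical names:

```python
import numpy as np

def variance_penalized_loss(agent_values, base_loss, lam=0.1):
    """Augment a standard learning loss with the variance of value
    estimates across agents in the same state, so training pushes
    same-state agents toward equal value (and hence equal revenue)."""
    return base_loss + lam * np.var(agent_values)
```

When all same-state agents already have equal values, the penalty vanishes and the objective reduces to the base loss.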
Restless multi-armed bandits (RMAB) are a framework for allocating limited resources under uncertainty. It is an extremely useful model for monitoring beneficiaries and executing timely interventions to ensure maximum benefit in public health settings (e.g., ensuring patients take medicine in tuberculosis settings, ensuring pregnant mothers listen to automated calls about good pregnancy practices). Due to the limited resources, certain communities or regions are usually starved of interventions, which can have follow-on effects. To avoid starvation in the execution of interventions across individuals/regions/communities, we first provide a soft fairness constraint and then an approach to enforce this soft fairness constraint in RMABs. The soft fairness constraint requires that an algorithm never probabilistically favors one arm over another if the long-term cumulative reward of choosing the latter arm is higher. Our approach incorporates softmax-based value iteration in the RMAB setting to design selection algorithms that satisfy the proposed fairness constraint. Our approach, referred to as SoftFair, also provides theoretical performance guarantees and is asymptotically optimal. Finally, we demonstrate the utility of our approach on simulated benchmarks and show that the soft fairness constraint can be handled without a significant sacrifice in value.
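The softmax-based selection at the heart of an approach like SoftFair can be illustrated with a small sketch; the value-iteration machinery is omitted, and `values` stands in for the arms' estimated long-term cumulative rewards. This is a hedged illustration, not the paper's algorithm:

```python
import math
import random

def softmax_probs(values, temperature=1.0):
    """Convert per-arm value estimates into selection probabilities.

    Arms with higher long-term value get higher probability, but every
    arm keeps non-zero probability, which avoids starving any arm."""
    m = max(values)  # subtract max for numerical stability
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def select_arms(values, k, rng=random):
    """Sample k distinct arms, weighted by the softmax of their values."""
    probs = softmax_probs(values)
    arms = list(range(len(values)))
    chosen = []
    for _ in range(k):
        pick = rng.choices(arms, weights=[probs[a] for a in arms], k=1)[0]
        chosen.append(pick)
        arms.remove(pick)
    return chosen
```

Note how the soft constraint is satisfied by construction: an arm with a higher value estimate can never receive a lower selection probability than a lower-valued arm.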
Restless multi-armed bandits (RMAB) are an apt model to represent decision-making problems in public health interventions (e.g., tuberculosis, maternal and child care), anti-poaching planning, sensor monitoring, personalized recommendations and many more. Existing research on RMABs has provided mechanisms and theoretical results for a wide variety of settings, where the focus is on maximizing expected value. In this paper, we are interested in ensuring that RMAB decision making is also fair to different arms, while maximizing expected value. In the context of public health settings, this would ensure that different people and/or communities are fairly represented while making public health intervention decisions. To achieve this goal, we formally define the fairness constraints in RMABs and provide planning and learning methods to solve RMABs in a fair manner. We demonstrate key theoretical properties of fair RMABs and experimentally demonstrate that our proposed methods handle fairness constraints without significantly sacrificing solution quality.
Owing to the benefits for customers (lower prices), drivers (higher revenues), aggregation companies (higher revenues) and the environment (fewer vehicles), on-demand ride pooling (e.g., Uber pool, Grab share) has become quite popular. The significant computational complexity of matching vehicles to combinations of requests has meant that traditional ride-pooling approaches are myopic, in that they do not consider the impact of current matches on future value for vehicles/drivers. Recently, Neural Approximate Dynamic Programming (NeurADP) has employed value decomposition with Approximate Dynamic Programming (ADP) to outperform myopic approaches by considering the impact of an individual agent's (vehicle's) chosen actions on that agent's future value. However, in order to ensure scalability and facilitate city-scale ride pooling, NeurADP completely ignores the impact of other agents' actions on the value of an individual agent/vehicle. As demonstrated in our experimental results, ignoring the impact of other agents' actions on individual value can have a significant effect on overall performance when there is increased competition among vehicles for demand. Our key contribution is a novel mechanism based on computing conditional expectations through joint conditional probabilities to capture dependencies on other agents' actions, without increasing the complexity of training or decision making. We show that our new approach, Conditional Expectation based Value Decomposition (CEVD), outperforms NeurADP by up to 9.76% in terms of overall requests served, which is a significant improvement on a city-wide benchmark taxi dataset.
Concept bottleneck models (CBMs) (Koh et al. 2020) are interpretable neural networks that first predict labels for human-interpretable concepts relevant to the prediction task, and then predict the final label based on the concept label predictions. We extend CBMs to interactive prediction settings where the model can query a human collaborator for the labels of some concepts. We develop an interaction policy that, at prediction time, chooses which concepts to request a label for so as to maximally improve the final prediction. We demonstrate that a simple policy combining concept prediction uncertainty and influence of the concept on the final prediction achieves strong performance and outperforms a static approach proposed in Koh et al. (2020) as well as active feature acquisition methods proposed in the literature. We show that the interactive CBM can achieve accuracy gains of 5-10% with only 5 interactions over competitive baselines on the Caltech-UCSD Birds, CheXpert and OAI datasets.
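A policy of the "uncertainty times influence" flavor described above might be sketched as follows. The binary-entropy uncertainty and the scalar `influence` values are illustrative stand-ins for the paper's actual estimates, not its exact scoring rule:

```python
import math

def concept_scores(concept_probs, influence):
    """Score each unlabeled concept by its predictive uncertainty
    (binary entropy of the predicted concept probability) times the
    absolute influence of that concept on the final prediction."""
    scores = []
    for p, w in zip(concept_probs, influence):
        entropy = -(p * math.log(p + 1e-12) + (1 - p) * math.log(1 - p + 1e-12))
        scores.append(entropy * abs(w))
    return scores

def next_concept_to_query(concept_probs, influence):
    """Return the index of the concept whose human label is expected
    to improve the final prediction the most."""
    scores = concept_scores(concept_probs, influence)
    return max(range(len(scores)), key=scores.__getitem__)
```

A concept the model is sure about (probability near 0 or 1) scores near zero even if influential, so the policy spends its limited interactions on uncertain, high-impact concepts.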
Selective classification involves identifying the subset of test samples that a model can classify with high accuracy, and is important for applications such as automated medical diagnosis. We argue that this capability of identifying uncertain samples is valuable for training classifiers as well, with the aim of building more accurate classifiers. We unify these dual roles by training a single auxiliary meta-network to output an importance weight as a function of the instance. This measure is used at train time to reweight training data, and at test time to rank test instances for selective classification. A second, key component of our proposal is the meta-objective of minimizing dropout variance (the variance of classifier output when subjected to random weight dropout) for training the meta-network. We train the classifier together with its meta-network using a nested objective of minimizing classifier loss on training data and meta-loss on a separate meta-training dataset. We outperform current state-of-the-art on selective classification by substantial margins--for instance, up to 1.9% AUC and 2% accuracy on a real-world diabetic retinopathy dataset. Finally, our meta-learning framework extends naturally to unsupervised domain adaptation, given our unsupervised variance minimization meta-objective. We show cumulative absolute gains of 3.4% / 3.3% accuracy and AUC over the other baselines in domain shift settings on the Retinopathy dataset using unsupervised domain adaptation.
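The dropout-variance quantity in the meta-objective above can be estimated by Monte Carlo: run the stochastic forward pass several times and take the variance across runs. In this sketch, `logit_fn` is a hypothetical classifier forward pass that applies a fresh random dropout mask on each call; it is not the paper's implementation:

```python
import numpy as np

def dropout_variance(logit_fn, x, n_samples=20, rng=None):
    """Monte Carlo estimate of the variance of a classifier's output
    under random weight dropout. `logit_fn(x, rng)` is assumed to
    apply a fresh dropout mask on each call."""
    rng = rng or np.random.default_rng(0)
    outs = np.stack([logit_fn(x, rng) for _ in range(n_samples)])
    # Variance across dropout samples, averaged over output dimensions.
    return outs.var(axis=0).mean()
```

A deterministic network yields zero variance; instances on which dropout noticeably perturbs the output score as uncertain.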
Many real-world learning scenarios face the challenge of slow concept drift, where data distributions change gradually over time. In this setting, we pose the problem of learning temporally sensitive importance weights for training data, in order to optimize predictive accuracy. We propose a class of temporal reweighting functions that can capture multiple timescales of change in the data, as well as instance-specific characteristics. We formulate a bi-level optimization criterion, and an associated meta-learning algorithm, by which these weights can be learned. In particular, our formulation trains an auxiliary network to output weights as a function of training instances, thereby compactly representing the instance weights. We validate our temporal reweighting scheme on a large real-world dataset of 39M images spread over a 9 year period. Our extensive experiments demonstrate the necessity of instance-based temporal reweighting in the dataset, and show significant improvements over classical batch-learning approaches. Further, our proposal easily generalizes to a streaming setting and shows significant gains compared to recent continual learning methods.
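One simple member of the "multiple timescales" family of temporal reweighting functions is a mixture of exponential decays in the age of a training instance. This is an illustrative form only; the paper's learned, instance-conditioned network is abstracted away, and `coeffs`/`timescales` are hypothetical parameters:

```python
import math

def temporal_weight(age, coeffs, timescales):
    """Importance weight for a training instance of the given age,
    as a mixture of exponential decays: short timescales discount
    fast-drifting signal, long timescales retain stable signal."""
    return sum(c * math.exp(-age / t) for c, t in zip(coeffs, timescales))
```

With one fast and one slow timescale, recent instances are weighted highest while old instances retain a small, slowly decaying weight rather than being discarded outright.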
Models trained via empirical risk minimization (ERM) are known to rely on spurious correlations between labels and task-independent input features, resulting in poor generalization to distributional shifts. Group distributionally robust optimization (G-DRO) can alleviate this problem by minimizing the worst-case loss over a set of pre-defined groups of training data. G-DRO successfully improves performance on the worst group, where the correlation does not hold. However, G-DRO assumes that the spurious correlations and associated worst groups are known in advance, making it challenging to apply to new tasks with potentially multiple unknown spurious correlations. We propose AGRO -- Adversarial Group discovery for Distributionally Robust Optimization -- an end-to-end approach that jointly identifies error-prone groups and improves accuracy on them. AGRO equips G-DRO with an adversarial slicing model to find a group assignment for training examples which maximizes worst-case loss over the discovered groups. On the WILDS benchmark, AGRO results in 8% higher model performance on average on known worst-groups, compared to prior group discovery approaches used with G-DRO. AGRO also improves out-of-distribution performance on SST2, QQP, and MS-COCO -- datasets where potential spurious correlations are as yet uncharacterized. Human evaluation of AGRO groups shows that they contain well-defined, yet previously unstudied spurious correlations that lead to model errors.
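The worst-case quantity that G-DRO minimizes (and that AGRO's slicing model adversarially maximizes over discovered groups) can be sketched as follows; `losses` are per-example losses and `groups` are hypothetical group assignments from the slicing model:

```python
def worst_group_loss(losses, groups, n_groups):
    """Mean loss within each group; the worst-case group loss is the
    maximum of these per-group means. G-DRO minimizes this quantity,
    while an adversarial group-discovery model maximizes it."""
    sums = [0.0] * n_groups
    counts = [0] * n_groups
    for loss, g in zip(losses, groups):
        sums[g] += loss
        counts[g] += 1
    return max(s / c for s, c in zip(sums, counts) if c > 0)
```

Unlike the ERM objective (the mean over all examples), this objective cannot be lowered by sacrificing a small group, which is what makes it robust to spurious correlations concentrated in minority groups.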
Language models trained on massive prompted multitask datasets like T0 (Sanh et al., 2021) or FLAN (Wei et al., 2021a) can generalize to tasks unseen during training. We show that training on a carefully chosen subset of instances can outperform training on all available data on a variety of datasets. We assume access to a small number (250--1000) of unlabeled target task instances, select their nearest neighbors from a pool of multitask data, and use the retrieved data to train target task-specific models. Our method is more data-efficient than training a single multitask model, while still outperforming it by large margins. We evaluate across a diverse set of tasks not in the multitask pool we retrieve from, including those used to evaluate T0 and additional complex tasks including legal and scientific document QA. We retrieve small subsets of P3 (the collection of prompted datasets from which T0's training data was sampled) and finetune T5 models that outperform the 3-billion parameter variant of T0 (T0-3B) by 3--30% on 12 out of 14 evaluation datasets while using at most 2% of the data used to train T0-3B. These models also provide a better initialization than T0-3B for few-shot finetuning on target-task data, as shown by a 2--23% relative improvement over few-shot finetuned T0-3B models on 8 datasets. Our code is available at https://github.com/allenai/data-efficient-finetuning.
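The retrieval step described above can be sketched with plain cosine-similarity nearest neighbors; the embedding model is abstracted away, and the arrays hold hypothetical instance embeddings rather than the paper's actual representations:

```python
import numpy as np

def retrieve_neighbors(target_emb, pool_emb, k):
    """For each unlabeled target-task instance, find its k nearest
    multitask-pool instances by cosine similarity; the union of these
    neighbors forms the task-specific training subset."""
    t = target_emb / np.linalg.norm(target_emb, axis=1, keepdims=True)
    p = pool_emb / np.linalg.norm(pool_emb, axis=1, keepdims=True)
    sims = t @ p.T                              # cosine similarities
    nn = np.argsort(-sims, axis=1)[:, :k]       # top-k per target instance
    return sorted(set(nn.ravel().tolist()))     # deduplicated pool indices
```

Because the retrieved subset is a small fraction of the pool, finetuning on it is far cheaper than multitask training on all available data.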
Predictive simulations of the shock-to-detonation transition (SDT) in heterogeneous energetic materials (EM) are vital to the design and control of their energy release and sensitivity. Due to the complexity of the thermo-mechanics of EM during the SDT, both macro-scale response and sub-grid mesoscale energy localization must be captured accurately. This work proposes an efficient and accurate multiscale framework for SDT simulations of EM. We employ deep learning to model the mesoscale energy localization of shock-initiated EM microstructures, and use its predictions to supply reaction progress rate information to the macroscale SDT simulation. The proposed multiscale modeling framework is divided into two stages. First, a physics-aware recurrent convolutional neural network (PARC) is used to model the mesoscale energy localization of shock-initiated heterogeneous EM microstructures. PARC is trained using direct numerical simulations (DNS) of hotspot ignition and growth within microstructures of pressed HMX material subjected to different input shock strengths. After training, PARC is employed to supply hotspot ignition and growth rates for macroscale SDT simulations. We show that PARC can play the role of a surrogate model in a multiscale simulation framework, while drastically reducing the computation cost and providing improved representations of the sub-grid physics. The proposed multiscale modeling approach will provide a new tool for material scientists in designing high-performance and safer energetic materials.